83 research outputs found

Building a coreference-annotated corpus from the domain of biochemistry

Author: Batista-Navarro Riza Theresa
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/06/2011

The University of Manchester - Institutional Repository

Analysing Entity Type Variation across Biomedical Subdomains

Author: Ananiadou Sophia
Batista-Navarro Riza Theresa
Mihaila Claudiu
Publication date: 26/05/2012

The University of Manchester - Institutional Repository

IoT Cooking Workflows for End Users: A Comparison Between Behaviour Trees and the DX-MAN Model

Author: Arellanes Damian
Batista-Navarro Riza Theresa
Clinch Sarah
Ventirozos Filippos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/08/2021

The University of Manchester - Institutional Repository

Lancaster E-Prints

Identification of research hypotheses and new knowledge from scientific literature

Author: Ananiadou Sophia
Batista-Navarro Riza Theresa
Mcnaught John
Natural Language Processing and Text Mining
Raheel Nawaz
Shardlow Matthew
Thompson Paul
Publication venue: BioMed Central
Publication date: 01/06/2018

Background: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author’s intended knowledge gain) and New Knowledge (an author’s findings). The method incorporates various features, including a combination of simple MK dimensions.
Methods: We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated.
Results: We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state-of-the-art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836).
Conclusion: We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.

E-space: Manchester Metropolitan University's Research Repository

ZENODO

Directory of Open Access Journals

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

The University of Manchester - Institutional Repository

Text Mining the History of Medicine

Author: A Henriksson
AR Aronson
C Mihăilă
Carsten Timmermann
D Lopresti
D McClosky
Elizabeth Toon
G Hripcsak
G Schneider
Georgios Kontonatsios
H Moen
H Suominen
J Cohen
J-D Kim
Jacob Carter
John McNaught
JR Firth
K Bontcheva
KB Wagholikar
L Kelly
LM Schriml
Luis M. Rocha
M Miwa
M Ruiz-Casado
M Worboys
MA Hearst
Michael Worboys
N Alnazzawi
O Bodenreider
P Murrieta-Flores
P Thompson
Paul Thompson
R Prasad
RI Dogan
Riza Theresa Batista-Navarro
S Jonnalagadda
S Pyysalo
S Zhang
Sophia Ananiadou
T Hitchcock
TH Tanner
Y Tsuruoka
Y Wang
Z Liu
ZS Harris
Ö Uzuner
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/01/2016

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible thanks to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.

Crossref

Directory of Open Access Journals

Edge Hill University Research Information Repository

PubMed Central

The University of Manchester - Institutional Repository

The CHEMDNER corpus of chemicals and drugs and its annotation principles

Author: Akhondi S.A. (Saber A.)
Alves R. (Rui)
An X. (Xin)
Ata C. (Caglar)
Bajec M. (Marko)
Batista-Navarro R.T. (Riza Theresa)
Campos D. (David)
Can T. (Tolga)
Choi M. (Miji)
Couto F.M. (Francisco M.)
Dai H.J (Hong-Jie)
Dieb T.M. (Thaer M.)
Ekbal A. (Asif)
Giles C.L. (C. Lee)
Huber T. (Torsten)
Irmer M. (Matthias)
Ji D. (Donghong)
Khabsa M. (Madian)
Kors J.A. (Jan A.)
Krallinger M. (Martin)
Lamurias A. (Andre)
Leaman R. (Robert)
Leitner F. (Florian)
Liu H. (Hongfang)
Lowe D.M. (Daniel M.)
Lu Y. (Yanan)
Lu Z. (Zhiyong)
Martínez P. (Paloma)
Matos S. (Sérgio)
Munkhdalai T. (Tsendsuren)
Nathan S. (Senthil)
Oyarzabal J. (Julen)
Rabal O. (Obdulia)
Rak R. (Rafal)
Ramanan S.V. (S.V.)
Ravikumar K.E. (Komandur Elayavilli)
Rocktäschel T. (Tim)
Ryu K.H. (Keun Ho)
Salgado D. (David)
Sayle R.A. (Roger A.)
Segura-Bedmar I. (Isabel)
Sikdar U.K. (Utpal Kumar)
Tang B. (Buzhou)
Tzong-Han-Tsai R. (Richard)
Usié A. (Anabel)
Valencia A. (Alfonso)
Vazquez M. (Miguel)
Verspoor K. (Karin)
Weber L. (Lutz)
Xu H. (Hua)
Xu S. (Shuo)
Yoshioka M. (Masaharu)
Zitnik S. (Slavko)
Publication venue: Chemistry Central
Publication date: 01/01/2015

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text were measured using an agreement study between annotators, obtaining a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain-specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus

Universidad de Navarra

Erasmus University Digital Repository

Dadun, University of Navarra